Modifications to handle force alignment, EOS placement, and OTF early interruption #4
Open
ankitapasad wants to merge 7 commits into
Signed-off-by: Ankita Pasad <apasad@nvidia.com>
…lled by data.fix_eos_placements=True, True by default. Signed-off-by: Ankita Pasad <apasad@nvidia.com>
beep boop 🤖: 🚨 The following files must be fixed before merge! Your code was analyzed with PyLint. The following annotations have been identified:
Thank you for improving NeMo's documentation!
What does this PR do?
This PR adds the following features:
Turn off force alignment during validation, because (i) it wastes resources and (ii) running it during validation often led to NCCL timeout errors.
Agent EOS placement:
Replaces `agent_supervision.end` with a fixed offset relative to the user's speech: the agent's EOS is placed at `user_turn_supervision.start + eos_offset_frames`. This decouples the agent EOS from agent duration modeling. The EOS now serves as a dedicated barge-in trigger that signals exactly when the agent must yield to the user.
This resolves issues with our current training data mix, where `agent_supervision.end` is inconsistent or timestamped prematurely, before the user begins speaking. By anchoring EOS placement to the user's start, we provide the model with a stable, causal conditioning signal.
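A minimal sketch of the placement rule, under stated assumptions: 80 ms frames (the PR equates 8 tokens with 640 ms), the EOS for each agent turn anchored to the next user turn, and a hypothetical `Supervision` stand-in with times in seconds. `Supervision`, `to_frame`, `place_agent_eos`, and `FRAME_SEC` are illustrative names, not NeMo's API.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_SEC = 0.08  # assumption: 80 ms per frame, so 8 frames = 640 ms


@dataclass
class Supervision:
    """Hypothetical stand-in for a turn supervision, times in seconds."""
    speaker: str  # "user" or "agent"
    start: float
    end: float


def to_frame(t: float) -> int:
    """Convert a time in seconds to the nearest frame index."""
    return int(round(t / FRAME_SEC))


def place_agent_eos(turns: List[Supervision], eos_offset_frames: int = 8) -> List[Optional[int]]:
    """EOS frame for each agent turn: next user start + eos_offset_frames.

    The agent's own `end` is deliberately ignored. An agent turn with no later
    user turn gets None here (the PR does not say how that case is handled).
    """
    eos: List[Optional[int]] = []
    for i, turn in enumerate(turns):
        if turn.speaker != "agent":
            continue
        # Assumption: anchor to the first user turn that follows this agent turn.
        next_user = next((t for t in turns[i + 1 :] if t.speaker == "user"), None)
        eos.append(None if next_user is None else to_frame(next_user.start) + eos_offset_frames)
    return eos
```

For example, an agent turn recorded as ending at 4.0 s, followed by a user turn starting at 3.6 s, gets its EOS at frame 45 + 8 = 53 (4.24 s) regardless of the agent's recorded end. Exposing `eos_offset_frames` as a keyword argument is one way the hard-coded value could later come from config.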
Note: `eos_offset_frames` is hard-coded to 8 in the `build_token_channel` arguments, but it could be made a parameter passed from the config. This behavior is controlled by `data.fix_eos_placements` (default: `True`).
OTF early interruption:
Early interruption is defined as the user interrupting the agent during the `<text>` token generation phase, not during the `<pad>` token generation phase.
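To make the definition concrete, here is a minimal sketch that treats the agent channel as a per-frame list of token strings and takes a turn's `<text>` phase to run from the turn start to its last non-pad token; the remaining frames of the turn count as the `<pad>` phase. The channel layout, the `<pad>` string, and the function names are assumptions for illustration only.

```python
from typing import Sequence

PAD = "<pad>"  # assumption: the pad token as it appears in the agent text channel


def last_text_frame(agent_channel: Sequence[str], turn_start: int, turn_end: int) -> int:
    """Index of the last non-pad token in agent_channel[turn_start:turn_end], or -1 if none."""
    for f in range(turn_end - 1, turn_start - 1, -1):
        if agent_channel[f] != PAD:
            return f
    return -1


def is_early_interruption(
    agent_channel: Sequence[str], turn_start: int, turn_end: int, interrupt_frame: int
) -> bool:
    """True if the user starts while the agent turn is still in its <text> phase."""
    last = last_text_frame(agent_channel, turn_start, turn_end)
    return last >= 0 and turn_start <= interrupt_frame <= last
```

In this sketch the phase, not the token at the interrupt frame, decides: if pads are interleaved between words, a frame holding `<pad>` can still fall inside the `<text>` phase.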
Controlled by `data.early_interruption_prob` (default: 0.0, i.e., turned off) and `data.early_interruption_overlap_tokens` (default: 8, i.e., 640 ms, consistent with the EOS placement behavior).
For randomly chosen conversations (based on `data.early_interruption_prob`), an agent turn is selected at random to be early-interrupted. The user and agent channels are advanced accordingly, and the conversation duration is truncated.
The fraction of samples transformed by early interruption is logged as `early_interruption_successful_ratio` in wandb.
Turn off early interruption for validation data.
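The transform can be sketched as follows. This is a hypothetical frame-level model, not the PR's implementation: a conversation is a list of `Turn` objects, the user is moved to start at a random frame inside the chosen agent turn's `<text>` phase, the agent keeps speaking for `overlap_tokens` frames and then yields, and every later turn is advanced by the same amount. The keyword arguments only stand in for `data.early_interruption_prob`, `data.early_interruption_overlap_tokens`, the validation switch, and the per-dataset `tags.otf_interruption` opt-out; `Turn`, `early_interrupt`, and `successful_ratio` are illustrative names.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Turn:
    """Hypothetical turn covering frames [start, end).

    For agent turns, text_end is the exclusive end of the <text> phase;
    frames in [text_end, end) belong to the <pad> phase.
    """
    speaker: str  # "user" or "agent"
    start: int
    end: int
    text_end: int = 0


def _advance(turn: Turn, by: int) -> Turn:
    """Shift a turn `by` frames earlier in time."""
    text_end = turn.text_end - by if turn.speaker == "agent" else turn.text_end
    return replace(turn, start=turn.start - by, end=turn.end - by, text_end=text_end)


def early_interrupt(
    turns: List[Turn],
    duration: int,
    rng: random.Random,
    prob: float = 0.0,  # stands in for data.early_interruption_prob
    overlap_tokens: int = 8,  # stands in for data.early_interruption_overlap_tokens
    is_validation: bool = False,  # early interruption is off for validation data
    otf_enabled: bool = True,  # stands in for the tags.otf_interruption dataset tag
) -> Tuple[List[Turn], int, bool]:
    """Return (turns, duration, applied) after an optional on-the-fly early interruption."""
    if is_validation or not otf_enabled or rng.random() >= prob:
        return turns, duration, False

    # Candidates: an agent turn directly followed by a user turn, where the user
    # can be moved to start inside the agent's <text> phase.
    candidates = [
        i
        for i in range(len(turns) - 1)
        if turns[i].speaker == "agent"
        and turns[i + 1].speaker == "user"
        and turns[i].start < min(turns[i].text_end, turns[i + 1].start)
    ]
    if not candidates:
        return turns, duration, False

    i = rng.choice(candidates)
    agent, user = turns[i], turns[i + 1]
    # New user start: a random frame inside the agent's <text> phase.
    cut = rng.randrange(agent.start, min(agent.text_end, user.start))
    shift = user.start - cut  # > 0 by construction

    # The agent keeps talking for overlap_tokens frames after the user starts,
    # mirroring the EOS offset, then yields.
    new_end = min(agent.end, cut + overlap_tokens)
    cut_agent = replace(agent, end=new_end, text_end=min(agent.text_end, new_end))

    # Advance every later turn on both channels and truncate the conversation.
    advanced = [_advance(t, shift) for t in turns[i + 1 :]]
    return turns[:i] + [cut_agent] + advanced, duration - shift, True


def successful_ratio(applied: Sequence[bool]) -> float:
    """Fraction of samples in a batch that were actually transformed."""
    return sum(applied) / len(applied) if applied else 0.0
```

A batch-level value analogous to `early_interruption_successful_ratio` would then be `successful_ratio` over the per-sample `applied` flags; using all samples as the denominator, rather than only those selected by `prob`, is an assumption of this sketch.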
An option to turn off early interruption for specific datasets is provided, since some datasets, such as those that already contain user interruptions, are not handled accurately by this on-the-fly logic. To disable it, pass `tags.otf_interruption=false` in the data YAML.
Debugging-friendly features that save audio and Audacity-format BOS and EOS labels, active only when `self.model_cfg.get("debug", False) == True`. These can be removed later.
`model.use_numbers_norm`, default `True`.
Collection: speechlm2